Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses

Marteau-Ferey, Ulysse, Bach, Francis, Rudi, Alessandro

Neural Information Processing Systems

In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression. We first prove that our new simple scheme, based on a sequence of problems with decreasing regularization parameters, is globally convergent and that this convergence is linear, with a constant factor that scales only logarithmically with the condition number. In the parametric setting, we obtain an algorithm with the same scaling as regular first-order methods but with improved behavior, in particular on ill-conditioned problems. Second, in the non-parametric machine learning setting, we provide an explicit algorithm combining the previous scheme with Nyström projection techniques, and prove that it achieves optimal generalization bounds with a time complexity of order O(n df), a memory complexity of order O(df^2), and no dependence on the condition number, generalizing the results known for least-squares regression. Here n is the number of observations and df is the associated degrees of freedom. In particular, this is the first large-scale algorithm to solve logistic and softmax regressions in the non-parametric setting with large condition numbers and theoretical guarantees.
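
To make the first part of the abstract concrete, the following is a minimal, purely illustrative sketch (Python with NumPy) of a decreasing-regularization Newton scheme of this flavor for l2-regularized logistic regression: the regularization parameter is reduced geometrically, and each intermediate problem is approximately solved by a few damped Newton steps warm-started at the previous solution. This is an assumed reading of the abstract, not the authors' actual algorithm or code; all names (newton_step, solve_path) and parameter choices (initial lambda, decrease factor, number of inner steps) are hypothetical.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_step(w, X, y, lam):
    """One damped Newton step on the regularized logistic loss.

    Loss: (1/n) * sum_i log(1 + exp(-y_i x_i^T w)) + (lam/2) ||w||^2, with y_i in {-1, +1}.
    """
    n, d = X.shape
    margins = y * (X @ w)
    p = sigmoid(-margins)                      # per-sample gradient weights
    grad = -(X.T @ (y * p)) / n + lam * w
    s = p * (1.0 - p)                          # per-sample Hessian weights
    H = (X.T * s) @ X / n + lam * np.eye(d)
    direction = np.linalg.solve(H, grad)       # Newton direction H^{-1} grad
    # Damped step size based on the Newton decrement, as is standard in
    # self-concordant analyses.
    decrement = np.sqrt(direction @ (H @ direction))
    step = 1.0 / (1.0 + decrement)
    return w - step * direction

def solve_path(X, y, lam_final, lam_init=1.0, decrease=0.5, newton_iters=5):
    """Decrease lam geometrically from lam_init to lam_final, warm-starting
    a few damped Newton steps at each stage with the previous solution."""
    w = np.zeros(X.shape[1])
    lam = lam_init
    while lam > lam_final:
        for _ in range(newton_iters):
            w = newton_step(w, X, y, lam)
        lam = max(lam * decrease, lam_final)
    for _ in range(newton_iters):
        w = newton_step(w, X, y, lam_final)
    return w

Informally, warm-starting along the decreasing regularization path keeps each iterate close to the solution of the next, better-conditioned subproblem, which is consistent with the abstract's claim that the number of stages, and hence the overhead relative to first-order methods, grows only logarithmically with the condition number.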


Reviews: Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses

Neural Information Processing Systems

This is obviously intended to be fleshed out in Section 2, but even there, the differences between the proposal and the references are not explicit. For example, I'm not sure how this paper differs from prior generalized-self-concordant work (e.g.


Reviews: Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses

Neural Information Processing Systems

The paper studies large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, in particular in ill-conditioned settings, providing new optimal generalization bounds and proofs of convergence. The reviewers found the contributions of high quality and were satisfied with the clarifications provided by the author response.

